Heavy-Tailed Symmetric Stochastic Neighbor Embedding

نویسندگان

  • Zhirong Yang
  • Irwin King
  • Zenglin Xu
  • Erkki Oja
چکیده

Stochastic Neighbor Embedding (SNE) has shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size, momentum, etc., in finding its optimum. In this paper, we propose the Heavy-tailed Symmetric Stochastic Neighbor Embedding (HSSNE) method, which is a generalization of the t-SNE to accommodate various heavytailed embedding similarity functions. With this generalization, we are presented with two difficulties. The first is how to select the best embedding similarity among all heavy-tailed functions and the second is how to optimize the objective function once the heavy-tailed function has been selected. Our contributions then are: (1) we point out that various heavy-tailed embedding similarities can be characterized by their negative score functions. Based on this finding, we present a parameterized subset of similarity functions for choosing the best tail-heaviness for HSSNE; (2) we present a fixed-point optimization algorithm that can be applied to all heavy-tailed functions and does not require the user to set any parameters; and (3) we present two empirical studies, one for unsupervised visualization showing that our optimization algorithm runs as fast and as good as the best known t-SNE implementation and the other for semi-supervised visualization showing quantitative superiority using the homogeneity measure as well as qualitative advantage in cluster separation over t-SNE.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Stochastic approximation with long range dependent and heavy tailed noise

Stability and convergence properties of stochastic approximation algorithms are analyzed when the noise includes a long range dependent component (modeled by a fractional Brownian motion) and a heavy tailed component (modeled by a symmetric stable process), in addition to the usual ‘martingale noise’. This is motivated by the emergent applications in communications. The proofs are based on comp...

متن کامل

Stochastic approximation with 'bad' noise

Stability and convergence properties of stochastic approximation algorithms are analyzed when the noise includes a long range dependent component (modeled by a fractional Brownian motion) and a heavy tailed component (modeled by a symmetric stable process), in addition to the usual ‘martingale noise’. This is motivated by the emergent applications in communications. The proofs are based on comp...

متن کامل

Robust Bayesian analysis of heavy-tailed stochastic volatility models using scale mixtures of normal distributions

A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, student-t, slash and the variance gamma distributions. Using a Bayesian para...

متن کامل

Stochastic neighbor embedding (SNE) for dimension reduction and visualization using arbitrary divergences

We present a systematic approach to the mathematical treatment of the t-distributed stochastic neighbor embedding (t-SNE) and the stochastic neighbor embedding (SNE) method. This allows an easy adaptation of the methods or exchange of their respective modules. In particular, the divergence which measures the difference between probability distributions in the original and the embedding space ca...

متن کامل

Heavy-Tailed Process Priors for Selective Shrinkage

Heavy-tailed distributions are often used to enhance the robustness of regression and classification methods to outliers in output space. Often, however, we are confronted with “outliers” in input space, which are isolated observations in sparsely populated regions. We show that heavy-tailed stochastic processes (which we construct from Gaussian processes via a copula), can be used to improve r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009